Aiming at the problems of large area and large consumption of fresh randomness in Threshold Implementation (TI) of SM4, an improved threshold implementation scheme of SM4 was proposed. In the case of satisfying the threshold implementation theory, the operation of S-box nonlinear inversion was shared with no fresh randomness, and a domain-oriented multiplication mask scheme was introduced to reduce the fresh randomness consumption of S-box to 12 bits. Based on the idea of the pipeline, a new SM4 serial architecture with 8-bit data width was designed. The threshold implementation of S-box was reused, and the linear function of SM4 was optimized to make the area of threshold implementation of SM4 more compact, only 6 513 GE. In comparison with the TI scheme of SM4 with 128-bit data width, the area of the proposed scheme is reduced by more than 63.7%, and there is a better trade-off between speed and area. The side-channel experimental results show that the proposed scheme has the capability of anti-first-order Differential Power Analysis (DPA).